Extracting information from informal communication
Abstract
This thesis focuses on the problem of extracting information from informal communication. Textual informal communication, such as e-mail, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand because it lacks editing and structure. Thus, techniques that work well for formal text, such as newspaper articles, may be insufficient for informal text. One of our aims is to advance the state of the art on sub-problems of the information extraction task. We make contributions to the problems of named entity extraction, co-reference resolution and context tracking, and we channel our efforts toward methods that are particularly applicable to informal communication. We also consider a type of information that is largely specific to informal communication: preferences and opinions. Individuals often express their opinions on products and services in such communication. Others may read these “reviews” to try to predict their own experiences. However, humans do a poor job of aggregating and generalizing large sets of data. We develop techniques that predict unobserved opinions. We address both the single-user case, where information about the items is known, and the multi-user case, where we can generalize opinions without external information. Experiments on large-scale rating data sets validate our approach.
Thesis Supervisor: Tommi Jaakkola
Title: Associate Professor of Electrical Engineering and Computer Science
Similar resources
Extracting Information from Informal Communication by Jason
This thesis focuses on the problem of extracting information from informal communication. Textual informal communication, such as e-mail, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand due to lack of editing and structure. Thus, techniques which work well for formal text, such as newspap...
Extracting Semantic User Networks from Informal Communication Exchanges
Nowadays, communication exchanges are an integral and time-consuming part of people’s jobs, especially for so-called knowledge workers.
Application of Big Data Analytics in Power Distribution Network
Smart grid enhances optimization in the generation, distribution and consumption of electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, the most common being the deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
An Investigation of Gender Differences Between Women’s and Men’s Informal Discussion in Iranian EFL Context
The language used by women and men differs in all speech communities. To examine some of these variations, the present study investigated the differences in informal written discourse. For this purpose, men’s and women’s informal language was compared with regard to length of utterances, questions, intensifiers, and hedges. Results revealed that men employed highe...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...